62 research outputs found

    Beyond position weight matrices: nucleotide correlations in transcription factor binding sites and their description

    The identification of transcription factor binding sites (TFBSs) on genomic DNA is of crucial importance for understanding and predicting regulatory elements in gene networks. TFBS motifs are commonly described by Position Weight Matrices (PWMs), in which each DNA base pair contributes independently to transcription factor (TF) binding, despite mounting evidence of interdependence between base-pair positions. The recent availability of genome-wide data on TF-bound DNA regions offers the possibility to revisit this question in detail for TF binding in vivo. Here, we use available fly and mouse ChIP-seq data and show that the independent model generally does not reproduce the observed statistics of TFBSs, generalizing previous observations. We further show that TFBS description and predictability can be systematically improved by taking into account pairwise correlations in the TFBS via the principle of maximum entropy. The resulting pairwise interaction model is formally equivalent to the disordered Potts models of statistical mechanics, and it generalizes previous approaches to interdependent positions. Its structure allows for co-variation of two or more base pairs, as well as secondary motifs. Although models consisting of mixtures of PWMs also have this last feature, we show that pairwise interaction models outperform them. The significant pairwise interactions are found to be sparse and to occur predominantly between consecutive base pairs. Finally, the use of a pairwise interaction model for the identification of TFBSs is shown to give significantly different predictions than a model based on independent positions.
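The contrast between an independent-position PWM and pairwise correlations can be illustrated with a small sketch. This is a toy illustration only, not the authors' maximum-entropy or Potts fitting procedure: the function names (`build_pwm`, `pwm_score`, `mutual_information`), the pseudocount, the uniform background, and the two-base toy sites are all assumptions made for the example.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0):
    """Per-position base frequencies (independent-position model)."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm

def pwm_score(seq, pwm, background=0.25):
    """Log-odds score: a sum of independent per-position contributions."""
    return sum(math.log(pwm[i][base] / background) for i, base in enumerate(seq))

def mutual_information(sites, i, j):
    """Mutual information (in nats) between positions i and j of the sites."""
    n = len(sites)
    joint = Counter((s[i], s[j]) for s in sites)
    left = Counter(s[i] for s in sites)
    right = Counter(s[j] for s in sites)
    return sum(
        (c / n) * math.log((c / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), c in joint.items()
    )

# Hypothetical toy data: the two positions always co-vary (AA or TT).
toy_sites = ["AA", "TT"] * 5
toy_pwm = build_pwm(toy_sites)

print(pwm_score("AA", toy_pwm), pwm_score("AT", toy_pwm))
print(mutual_information(toy_sites, 0, 1))
```

In this toy set the PWM assigns the observed site `AA` and the never-observed site `AT` the same score, while the mutual information between the two positions is nonzero. That gap is the kind of signal a pairwise model is meant to capture.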

    Uncovering the fragility of large-scale engineering project networks

    Engineering projects are notoriously hard to complete on time, with project delays often theorised to propagate across interdependent activities. Here, we use a novel dataset consisting of activity networks from 14 diverse, large-scale engineering projects to uncover network properties that impact timely project completion. We provide the first empirical evidence of the infectious nature of activity deviations, where perturbations in the delivery of a single activity can impact up to 4 activities downstream, leading to large perturbation cascades. We further show that perturbation clustering significantly affects overall project delays. Finally, we find that poorly performing projects have their highest perturbations in high-reach nodes, which can lead to the largest cascades, while well-performing projects have perturbations in low-reach nodes, resulting in localised cascades. Altogether, these findings pave the way for a network-science framework that can materially enhance the delivery of large-scale engineering projects.
    Comment: 13 pages, 3 figures, 7 supplementary figures
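One way to picture a node's "reach" in an activity network is as the number of activities that lie downstream of it. The sketch below assumes that reading, which the abstract does not define precisely. The function name `reach`, the adjacency-dictionary representation, and the toy project are all hypothetical.

```python
from collections import deque

def reach(graph, start):
    """Count activities reachable downstream of `start`, excluding `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return len(seen) - 1

# Hypothetical toy project: each activity maps to the activities that depend on it.
toy_project = {
    "design": ["procure", "permits"],
    "procure": ["build"],
    "permits": ["build"],
    "build": ["commission"],
    "commission": [],
}

for activity in toy_project:
    print(activity, reach(toy_project, activity))
```

Under this assumed definition, a delay in `design` can touch every later activity, whereas a delay in `commission` stays local.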

    Quantifying the rise and fall of scientific fields

    Science advances by pushing the boundaries of the adjacent possible. While the global scientific enterprise grows at an exponential pace, at the mesoscopic level the exploration and exploitation of research ideas is reflected in the rise and fall of research fields. The empirical literature has largely studied such dynamics on a case-by-case basis, with a focus on explaining how and why communities of knowledge production evolve. Although fields rise and fall on different temporal and population scales, they are generally argued to pass through a common set of evolutionary stages. To understand the social processes that drive these stages beyond case studies, we need a way to quantify and compare different fields on the same terms. In this paper we develop techniques for identifying scale-invariant patterns in the evolution of scientific fields, and demonstrate their usefulness using 1.5 million preprints from the arXiv repository covering 175 research fields spanning Physics, Mathematics, Computer Science, Quantitative Biology and Quantitative Finance. We show that fields consistently follow a rise-and-fall pattern captured by a two-parameter right-tailed Gumbel temporal distribution. We introduce a field-specific rescaled time and explore the generic properties shared by articles and authors at the creation, adoption, peak, and decay evolutionary phases. We find that the early phase of a field is characterized by the mixing of cognitively distant fields by small teams of interdisciplinary authors, while late phases exhibit the role of specialized, large teams building on previous work in the field. This method provides foundations to quantitatively explore the generic patterns underlying the evolution of research fields in science, with general implications for innovation studies.
    Comment: 18 pages, 4 figures, 8 SI figures
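For readers unfamiliar with the Gumbel shape, the sketch below evaluates the standard two-parameter (location `mu`, scale `beta`) Gumbel density for maxima, which rises quickly and decays slowly. The abstract does not give the paper's exact parametrization or its rescaled time, so the formula, the function name `gumbel_pdf`, and the example values are assumptions.

```python
import math

def gumbel_pdf(t, mu, beta):
    """Density of the standard Gumbel (maximum) distribution at time t.

    mu is the location (the peak of the curve) and beta > 0 is the scale.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (t - mu) / beta
    if z < -700:  # exp(-z) would overflow; the density is effectively zero
        return 0.0
    return math.exp(-(z + math.exp(-z))) / beta

# Hypothetical field peaking at year 10 with a scale of 3 years.
mu, beta = 10.0, 3.0
for year in (1, 5, 10, 15, 25):
    print(year, gumbel_pdf(year, mu, beta))
```

The curve peaks at `t = mu`, and the density a given distance after the peak exceeds the density the same distance before it, which is the right-tailed asymmetry the abstract refers to.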

    Six Homeoproteins and a linc-RNA at the Fast MYH Locus Lock Fast Myofiber Terminal Phenotype

    Thousands of long intergenic non-coding RNAs (lincRNAs) are encoded by the mammalian genome. However, the function of most of these lincRNAs has not been identified in vivo. Here, we demonstrate a role for a novel lincRNA, linc-MYH, in adult fast-type myofiber specialization. Fast myosin heavy chain (MYH) genes and linc-MYH share a common enhancer, located in the fast MYH gene locus and regulated by Six1 homeoproteins. linc-MYH in nuclei of fast-type myofibers prevents slow-type and enhances fast-type gene expression. Functional fast-sarcomeric unit formation is achieved by the coordinate expression of fast MYHs and linc-MYH, under the control of a common Six-bound enhancer.

    Six1 homeoprotein drives myofiber type IIA specialization in soleus muscle

    Background: Adult skeletal muscles are composed of slow and fast myofiber subtypes, each of which expresses selective genes required for its specific contractile and metabolic activity. Six homeoproteins are transcription factors that regulate muscle cell fate through activation of myogenic regulatory factors and drive fast-type gene expression during embryogenesis. Results: We show here that Six1 protein accumulates more robustly in the nuclei of adult fast-type muscles than in adult slow-type muscles; this specific enrichment takes place during perinatal growth. Deletion of Six1 in soleus impaired fast-type myofiber specialization during perinatal development, resulting in a slow phenotype and a complete lack of Myosin heavy chain 2A (MyHCIIA) expression. Global transcriptomic analysis of wild-type and Six1 mutant myofibers identified the gene networks controlled by Six1 in adult soleus muscle. This analysis showed that Six1 is required for the expression of numerous genes encoding fast-type sarcomeric proteins and glycolytic enzymes, and of genes controlling intracellular calcium homeostasis. In particular, parvalbumin, a key player in calcium buffering, is a direct target of Six1 in the adult myofiber. Conclusions: This analysis revealed that Six1 controls distinct aspects of adult muscle physiology in vivo, and acts as a main determinant of fast-fiber type acquisition and maintenance.

    iGEM: a model system for team science and innovation

    Teams are a primary source of innovation in science and technology. Rather than examining the lone genius, scholarly and policy attention has shifted to understanding how team interactions produce new and useful ideas. Yet the organizational roots of innovation remain unclear, in part because of the limitations of current data. This paper introduces the international Genetically Engineered Machine (iGEM) competition, a model system for studying team science and innovation. By combining digital laboratory notebooks with performance data from 2,406 teams over multiple years of participation, we reveal shared dynamical and organizational patterns across teams and identify features associated with team performance and success. This dataset makes visible organizational behavior that is typically hidden, and thus understudied, creating new opportunities for the science of science and innovation.
    Comment: 78 pages including SI, 7 figures, 18 SI figures

    Collaboration and Performance of Citizen Science Projects Addressing the Sustainable Development Goals

    Measuring progress towards the Sustainable Development Goals (SDGs) requires the collection of relevant and reliable data. Citizen science can provide an essential source of non-traditional data for tracking progress towards the SDGs, as well as generate social innovations that enable such progress. At its core, citizen science relies on participatory processes involving the collaboration of stakeholders with diverse standpoints, skills, and backgrounds. The ability to measure these participatory processes is therefore key for the monitoring and evaluation of citizen science projects and to support the decisions of their coordinators. Here, we show that the monitoring of social interaction networks provides unique insights into the participatory processes and outcomes of citizen science projects. We studied fourteen early-stage citizen science projects that participated in an innovation cycle focused on SDG 13, Climate Action, as part of the Crowd4SDG project. We implemented a monitoring strategy to measure the collaborative profiles of citizen science teams. This allowed us to generate dynamic interaction networks across complementary dimensions, making visible both formal and informal interactions associated with the division of labor, collaborations, advice seeking, and communication processes of the projects during their development. Leveraging jury evaluation data, we showed that while team composition and communication are associated with project quality, measures of collaboration and activity are associated with engagement quality. Overall, monitoring social interaction dynamics helps build a more comprehensive picture of participatory processes, which is important for guiding citizen science projects and for designing initiatives that leverage citizen science to address the SDGs.

    Inducing social self‐sorting in organic cages to tune the shape of the internal cavity

    Many interesting target guest molecules have low symmetry, yet most methods for synthesising hosts result in highly symmetrical capsules. Methods of generating lower-symmetry pores are thus required to maximise the binding affinity in host–guest complexes. Herein, we use mixtures of tetraaldehyde building blocks with cyclohexanediamine to access low-symmetry imine cages. Whether a low-energy cage is isolated can be correctly predicted from the thermodynamic preference observed in computational models. The stability of the observed structures depends on the geometrical match of the aldehyde building blocks. One bent aldehyde stands out as unable to assemble into high-symmetry cages, and the same aldehyde generates low-symmetry socially self-sorted cages when combined with a linear aldehyde. We exploit this finding to synthesise a family of low-symmetry cages containing heteroatoms, illustrating that pores of varying geometries and surface chemistries may be reliably accessed through computational prediction and self-sorting.

    Analyse computationnelle des éléments cis-régulateurs dans les génomes des drosophiles et des mammifères

    Cellular differentiation and tissue specification depend in part on the establishment of specific transcriptional programs of gene expression. These programs result from the interpretation of genomic regulatory information by sequence-specific transcription factors (TFs). Decoding this information in sequenced genomes is a key issue. In the first part, we study the interaction between TFs and the DNA sequences they bind to, called Transcription Factor Binding Sites (TFBSs). Using a Potts model inspired by spin glass physics along with high-throughput binding data for a variety of Drosophila and mammalian TFs, we show that TFBSs exhibit correlations among nucleotides and that accounting for their contribution to the binding energy greatly improves the predictability of genomic TFBSs. Then, we present Imogene, an extension to mammalian genomes of a Bayesian, phylogeny-based algorithm designed to computationally identify the Cis-Regulatory Modules (CRMs) that control gene expression in a set of co-regulated genes, and that was previously applied to Drosophila regulation. Starting with a small number of CRMs in a reference species as a training set, but with no a priori knowledge of the factors acting in trans, the algorithm uses the over-representation and conservation of TFBSs among related species to predict putative regulatory elements along with genomic CRMs underlying co-regulation. We present several applications of this algorithm in both Drosophila and vertebrates. We also present an extension of the algorithm to the case of pattern recognition, showing that CRMs with different patterns of expression can be distinguished on the sole basis of their DNA motif content. Finally, we present applications of these modeling tools to real biological cases: trichome differentiation in Drosophila and skeletal muscle differentiation in the mouse. In both cases, predictions were experimentally validated in joint work with biological teams, and point towards a great flexibility of the cis-regulatory processes.